Goto




Average Sensitivity of Euclidean k-Clustering

Neural Information Processing Systems

Given a set of $n$ points in $\mathbb{R}^d$, the goal of Euclidean $(k,\ell)$-clustering is to find $k$ centers that minimize the sum of the $\ell$-th powers of the Euclidean distance of each point to the closest center. In practical situations, the clustering result must be stable against points missing in the input data so that we can make trustworthy and consistent decisions. To address this issue, we consider the average sensitivity of Euclidean $(k,\ell)$-clustering, which measures the stability of the output in total variation distance against deleting a random point from the input data. We first show that the popular algorithm $k$-means++ and its variant called $D^\ell$-sampling have low average sensitivity. Next, we show that any approximation algorithm for Euclidean $(k,\ell)$-clustering can be transformed into an algorithm with low average sensitivity while almost preserving the approximation guarantee.
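To make the objective and the seeding procedure concrete, here is a minimal sketch, not the paper's implementation: $D^\ell$-sampling picks the first center uniformly at random, then draws each subsequent center with probability proportional to the $\ell$-th power of a point's distance to its nearest chosen center (with $\ell = 2$ this is exactly the classic $k$-means++ seeding). Function names and the toy data are illustrative assumptions.

```python
import math
import random

def d_ell_sampling(points, k, ell, rng=random):
    """Seed k centers via D^ell-sampling: each new center is drawn
    with probability proportional to dist(point, nearest center)**ell.
    With ell = 2 this reduces to k-means++ seeding."""
    # First center: uniform over the input points.
    centers = [rng.choice(points)]
    # d[i] = (distance from points[i] to its nearest chosen center) ** ell
    d = [math.dist(p, centers[0]) ** ell for p in points]
    for _ in range(k - 1):
        total = sum(d)
        if total == 0:  # every point coincides with a chosen center
            centers.append(rng.choice(points))
            continue
        # Sample an index with probability d[i] / total.
        r = rng.uniform(0, total)
        acc = 0.0
        for i, w in enumerate(d):
            acc += w
            if r <= acc:
                break
        centers.append(points[i])
        # Update each point's distance to its nearest chosen center.
        d = [min(di, math.dist(p, points[i]) ** ell)
             for di, p in zip(d, points)]
    return centers

def clustering_cost(points, centers, ell):
    """Euclidean (k, ell)-clustering objective: the sum over points of
    (distance to the closest center) ** ell."""
    return sum(min(math.dist(p, c) for c in centers) ** ell
               for p in points)
```

The point deletions studied in the abstract would correspond to rerunning a seeding like this on the input with one random point removed and comparing the output distributions in total variation distance.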